Goto

Collaborating Authors

 Šumadija District


NICE^k Metrics: Unified and Multidimensional Framework for Evaluating Deterministic Solar Forecasting Accuracy

arXiv.org Machine Learning

Accurate solar energy output prediction is key for integrating renewables into grids, maintaining stability, and improving energy management. However, standard error metrics such as Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Skill Scores (SS) fail to capture the multidimensional nature of solar irradiance forecasting. These metrics lack sensitivity to forecastability, rely on arbitrary baselines (e.g., clear-sky models), and are poorly suited for operational use. To address this, we introduce the NICEk framework (Normalized Informed Comparison of Errors, with k = 1, 2, 3, Sigma), offering a robust and interpretable evaluation of forecasting models. Each NICEk score corresponds to an Lk norm: NICE1 targets average errors, NICE2 emphasizes large deviations, NICE3 highlights outliers, and NICESigma combines all. Using Monte Carlo simulations and data from 68 stations in the Spanish SIAR network, we evaluated methods including autoregressive models, extreme learning, and smart persistence. Theoretical and empirical results align when assumptions hold (e.g., R^2 ~ 1.0 for NICE2). Most importantly, NICESigma consistently shows higher discriminative power (p < 0.05), outperforming traditional metrics (p > 0.05). The NICEk metrics exhibit stronger statistical significance (e.g., p-values from 10^-6 to 0.004 across horizons) and greater generalizability. They offer a unified and operational alternative to standard error metrics in deterministic solar forecasting.


On the Importance of Clearsky Model in Short-Term Solar Radiation Forecasting

arXiv.org Artificial Intelligence

Clearsky models are widely used in solar energy for many applications such as quality control, resource assessment, satellite-base irradiance estimation and forecasting. However, their use in forecasting and nowcasting is associated with a number of challenges. Synchronization errors, reliance on the Clearsky index (ratio of the global horizontal irradiance to its cloud-free counterpart) and high sensitivity of the clearsky model to errors in aerosol optical depth at low solar elevation limit their added value in real-time applications. This paper explores the feasibility of short-term forecasting without relying on a clearsky model. We propose a Clearsky-Free forecasting approach using Extreme Learning Machine (ELM) models. ELM learns daily periodicity and local variability directly from raw Global Horizontal Irradiance (GHI) data. It eliminates the need for Clearsky normalization, simplifying the forecasting process and improving scalability. Our approach is a non-linear adaptative statistical method that implicitely learns the irradiance in cloud-free conditions removing the need for an clear-sky model and the related operational issues. Deterministic and probabilistic results are compared to traditional benchmarks, including ARMA with McClear-generated Clearsky data and quantile regression for probabilistic forecasts. ELM matches or outperforms these methods, providing accurate predictions and robust uncertainty quantification. This approach offers a simple, efficient solution for real-time solar forecasting. By overcoming the stationarization process limitations based on usual multiplicative scheme Clearsky models, it provides a flexible and reliable framework for modern energy systems.


Towards Recommender Systems LLMs Playground (RecSysLLMsP): Exploring Polarization and Engagement in Simulated Social Networks

arXiv.org Artificial Intelligence

Given the exponential advancement in AI technologies and the potential escalation of harmful effects from recommendation systems, it is crucial to simulate and evaluate these effects early on. Doing so can help prevent possible damage to both societies and technology companies. This paper introduces the Recommender Systems LLMs Playground (RecSysLLMsP), a novel simulation framework leveraging Large Language Models (LLMs) to explore the impacts of different content recommendation setups on user engagement and polarization in social networks. By creating diverse AI agents (AgentPrompts) with descriptive, static, and dynamic attributes, we assess their autonomous behaviour across three scenarios: Plurality, Balanced, and Similarity. Our findings reveal that the Similarity Scenario, which aligns content with user preferences, maximizes engagement while potentially fostering echo chambers. Conversely, the Plurality Scenario promotes diverse interactions but produces mixed engagement results. Our study emphasizes the need for a careful balance in recommender system designs to enhance user satisfaction while mitigating societal polarization. It underscores the unique value and challenges of incorporating LLMs into simulation environments. The benefits of RecSysLLMsP lie in its potential to calculate polarization effects, which is crucial for assessing societal impacts and determining user engagement levels with diverse recommender system setups. This advantage is essential for developing and maintaining a successful business model for social media companies. However, the study's limitations revolve around accurately emulating reality. Future efforts should validate the similarity in behaviour between real humans and AgentPrompts and establish metrics for measuring polarization scores.


Towards New Benchmark for AI Alignment & Sentiment Analysis in Socially Important Issues: A Comparative Study of Human and LLMs in the Context of AGI

arXiv.org Artificial Intelligence

With the expansion of neural networks, such as large language models, humanity is exponentially heading towards superintelligence. As various AI systems are increasingly integrated into the fabric of societies-through recommending values, devising creative solutions, and making decisions-it becomes critical to assess how these AI systems impact humans in the long run. This research aims to contribute towards establishing a benchmark for evaluating the sentiment of various Large Language Models in socially importan issues. The methodology adopted was a Likert scale survey. Seven LLMs, including GPT-4 and Bard, were analyzed and compared against sentiment data from three independent human sample populations. Temporal variations in sentiment were also evaluated over three consecutive days. The results highlighted a diversity in sentiment scores among LLMs, ranging from 3.32 to 4.12 out of 5. GPT-4 recorded the most positive sentiment score towards AGI, whereas Bard was leaning towards the neutral sentiment. The human samples, contrastingly, showed a lower average sentiment of 2.97. The temporal comparison revealed differences in sentiment evolution between LLMs in three days, ranging from 1.03% to 8.21%. The study's analysis outlines the prospect of potential conflicts of interest and bias possibilities in LLMs' sentiment formation. Results indicate that LLMs, akin to human cognitive processes, could potentially develop unique sentiments and subtly influence societies' perceptions towards various opinions formed within the LLMs.


Evaluating Large Language Models Against Human Annotators in Latent Content Analysis: Sentiment, Political Leaning, Emotional Intensity, and Sarcasm

arXiv.org Artificial Intelligence

In the era of rapid digital communication, vast amounts of textual data are generated daily, demanding efficient methods for latent content analysis to extract meaningful insights. Large Language Models (LLMs) offer potential for automating this process, yet comprehensive assessments comparing their performance to human annotators across multiple dimensions are lacking. This study evaluates the reliability, consistency, and quality of seven state-of-the-art LLMs, including variants of OpenAI's GPT-4, Gemini, Llama, and Mixtral, relative to human annotators in analyzing sentiment, political leaning, emotional intensity, and sarcasm detection. A total of 33 human annotators and eight LLM variants assessed 100 curated textual items, generating 3,300 human and 19,200 LLM annotations, with LLMs evaluated across three time points to examine temporal consistency. Inter-rater reliability was measured using Krippendorff's alpha, and intra-class correlation coefficients assessed consistency over time. The results reveal that both humans and LLMs exhibit high reliability in sentiment analysis and political leaning assessments, with LLMs demonstrating higher internal consistency than humans. In emotional intensity, LLMs displayed higher agreement compared to humans, though humans rated emotional intensity significantly higher. Both groups struggled with sarcasm detection, evidenced by low agreement. LLMs showed excellent temporal consistency across all dimensions, indicating stable performance over time. This research concludes that LLMs, especially GPT-4, can effectively replicate human analysis in sentiment and political leaning, although human expertise remains essential for emotional intensity interpretation. The findings demonstrate the potential of LLMs for consistent and high-quality performance in certain areas of latent content analysis.


Decision-making algorithm based on the energy of interval-valued fuzzy soft sets

arXiv.org Artificial Intelligence

In our work, we continue to explore the properties of interval-valued fuzzy soft sets, which are obtained by combining interval-valued fuzzy sets and soft sets. We introduce the concept of energy of an interval-valued fuzzy soft set, as well as pessimistic and optimistic energy, enabling us to construct an effective decision-making algorithm. Through examples, the paper demonstrates how the introduced algorithm is successfully applied to problems involving uncertainty. Additionally, we compare the introduced method with other methods dealing with similar or related issues.


Amplification of Addictive New Media Features in the Metaverse

arXiv.org Artificial Intelligence

The emergence of the metaverse, envisioned as a hyperreal virtual universe facilitating boundless human interaction, stands to revolutionize our conception of media, with significant impacts on addiction, creativity, relationships, and social polarization. This paper aims to dissect the addictive potential of the metaverse due to its immersive and interactive features, scrutinize the effects of its recommender systems on creativity and social polarization, and explore potential consequences stemming from the metaverse development. We employed a literature review methodology, drawing parallels from the research on new media platforms and examining the progression of reality-mimicking features in media from historical perspectives to understand this transformative digital frontier. The findings suggest that these immersive and interactive features could potentially exacerbate media addiction. The designed recommender systems, while aiding personalization and user engagement, might contribute to social polarization and affect the diversity of creative output. However, our conclusions are based primarily on theoretical propositions from studies conducted on existing media platforms and lack empirical support specific to the metaverse. Therefore, this paper identifies a critical gap requiring further research, through empirical studies focused on metaverse use and addiction and exploration of privacy, security, and ethical implications associated with this burgeoning digital universe. As the development of the metaverse accelerates, it is incumbent on scholars, technologists, and policymakers to navigate its multilayered impacts thoughtfully to balance innovation with societal well-being.


GPT-4 Surpassing Human Performance in Linguistic Pragmatics

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) become increasingly integrated into everyday life, their capabilities to understand and emulate human cognition are under steady examination. This study investigates the ability of LLMs to comprehend and interpret linguistic pragmatics, an aspect of communication that considers context and implied meanings. Using Grice's communication principles, LLMs and human subjects (N=76) were evaluated based on their responses to various dialogue-based tasks. The findings revealed the superior performance and speed of LLMs, particularly GPT4, over human subjects in interpreting pragmatics. GPT4 also demonstrated accuracy in the pre-testing of human-written samples, indicating its potential in text analysis. In a comparative analysis of LLMs using human individual and average scores, the models exhibited significant chronological improvement. The models were ranked from lowest to highest score, with GPT2 positioned at 78th place, GPT3 ranking at 23rd, Bard at 10th, GPT3.5 placing 5th, Best Human scoring 2nd, and GPT4 achieving the top spot. The findings highlight the remarkable progress made in the development and performance of these LLMs. Future studies should consider diverse subjects, multiple languages, and other cognitive aspects to fully comprehend the capabilities of LLMs. This research holds significant implications for the development and application of AI-based models in communication-centered sectors.


Polar Ducks and Where to Find Them: Enhancing Entity Linking with Duck Typing and Polar Box Embeddings

arXiv.org Artificial Intelligence

Entity linking methods based on dense retrieval are an efficient and widely used solution in large-scale applications, but they fall short of the performance of generative models, as they are sensitive to the structure of the embedding space. In order to address this issue, this paper introduces DUCK, an approach to infusing structural information in the space of entity representations, using prior knowledge of entity types. Inspired by duck typing in programming languages, we propose to define the type of an entity based on the relations that it has with other entities in a knowledge graph. Then, porting the concept of box embeddings to spherical polar coordinates, we propose to represent relations as boxes on the hypersphere. We optimize the model to cluster entities of similar type by placing them inside the boxes corresponding to their relations. Our experiments show that our method sets new state-of-the-art results on standard entity-disambiguation benchmarks, it improves the performance of the model by up to 7.9 F1 points, outperforms other type-aware approaches, and matches the results of generative models with 18 times more parameters.


From Zero to Hero: Harnessing Transformers for Biomedical Named Entity Recognition in Zero- and Few-shot Contexts

arXiv.org Artificial Intelligence

Supervised named entity recognition (NER) in the biomedical domain depends on large sets of annotated texts with the given named entities. The creation of such datasets can be time-consuming and expensive, while extraction of new entities requires additional annotation tasks and retraining the model. To address these challenges, this paper proposes a method for zero- and few-shot NER in the biomedical domain. The method is based on transforming the task of multi-class token classification into binary token classification and pre-training on a large amount of datasets and biomedical entities, which allow the model to learn semantic relations between the given and potentially novel named entity labels. We have achieved average F1 scores of 35.44% for zero-shot NER, 50.10% for one-shot NER, 69.94% for 10-shot NER, and 79.51% for 100-shot NER on 9 diverse evaluated biomedical entities with fine-tuned PubMedBERT-based model. The results demonstrate the effectiveness of the proposed method for recognizing new biomedical entities with no or limited number of examples, outperforming previous transformer-based methods, and being comparable to GPT3-based models using models with over 1000 times fewer parameters. We make models and developed code publicly available.